
FIGURE 4.12

The main framework of the Discrepant Child-Parent model. In orange, we show the critical novelty of DCP-NAS, i.e., tangent propagation and decoupled optimization.

architectures with binarized weights and activations, which consider both real-valued architectures and binarized architectures.

4.4.3 Search Space

We search for computation cells as the building blocks of the final architecture. As in

[305, 307, 151] and Fig. 4.13, we construct the network with a predefined number of cells, and

each cell is a fully connected directed acyclic graph (DAG) G with N nodes. For simplicity,

we assume that each cell only takes the outputs of the two previous cells as input, and

each input node has pre-defined convolutional operations for preprocessing. Each node j is

obtained by

$$
a^{(j)} = \sum_{i<j} o^{(i,j)}\!\left(a^{(i)}\right), \qquad
o^{(i,j)}\!\left(a^{(i)}\right) = w^{(i,j)} \otimes a^{(i)},
\tag{4.27}
$$

where i ranges over the nodes that j depends on, with the constraint i < j to avoid cycles in a cell, and a^{(j)} is the output of node j. w^{(i,j)} denotes the weights of the convolution operation between the i-th and j-th nodes, and ⊗ denotes the convolution operation. Each node is a specific tensor, such as a feature map, and each directed edge (i, j) denotes an operation o^{(i,j)}(·),

which is sampled from the following M = 8 operations: no connection (zero), skip connection (identity), 3 × 3 max pooling, 3 × 3 average pooling, 3 × 3 separable convolution, 5 × 5 separable convolution, 3 × 3 dilated convolution, and 5 × 5 dilated convolution.

FIGURE 4.13

The cell architecture for DCP-NAS. One cell includes 2 input nodes, 4 intermediate nodes, and 14 edges.
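The cell structure of Eq. (4.27) can be sketched in a few lines of NumPy. This is a minimal illustration, not the DCP-NAS implementation: the edge operation o^{(i,j)} is replaced by a simple elementwise scaling instead of a real binarized convolution, and all names (make_cell, forward, feat_dim) are hypothetical. It does show how the 2 input nodes, 4 intermediate nodes, and 14 edges of Fig. 4.13 arise from the constraint i < j.

```python
import numpy as np

def make_cell(num_inputs=2, num_intermediate=4, feat_dim=8, seed=0):
    """Build one weight per directed edge (i, j) with i < j, where j
    ranges over the intermediate nodes only (inputs have no incoming edges)."""
    rng = np.random.default_rng(seed)
    n = num_inputs + num_intermediate
    return {(i, j): rng.standard_normal(feat_dim)
            for j in range(num_inputs, n) for i in range(j)}

def forward(edges, inputs, num_inputs=2, num_intermediate=4):
    """Compute Eq. (4.27): a^(j) = sum_{i<j} o^(i,j)(a^(i)),
    with o^(i,j) stubbed as elementwise scaling by w^(i,j)."""
    a = list(inputs)  # a^(0), a^(1): outputs of the two previous cells
    for j in range(num_inputs, num_inputs + num_intermediate):
        a.append(sum(edges[(i, j)] * a[i] for i in range(j)))
    return a

edges = make_cell()
outs = forward(edges, [np.ones(8), np.ones(8)])
print(len(edges))  # 14 edges: 2 + 3 + 4 + 5
print(len(outs))   # 6 nodes: 2 inputs + 4 intermediate
```

The edge count matches the caption of Fig. 4.13: intermediate nodes 2, 3, 4, 5 receive 2, 3, 4, and 5 incoming edges respectively, for 14 in total.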